AMENDMENTS 

In the Claims 

The following is a marked-up version of the claims with the language that is underlined
being added and the language that contains strikethrough being deleted:



1. (Currently Amended) A method comprising:

training an email system for determining spam, where training includes at least the 
following: 

[[receiving an email message having a word;]]
retrieving a first email message;

generating a phonetic equivalent of the at least one word from a body portion the 
email message; 

tokenizing the phonetic equivalent of the word to generate a token representative 
of the phonetic equivalent; 

tokenizing at least one word in a subject line of the first email message; 

tokenizing at least one simple mail transfer protocol (SMTP) email address 
associated with the first email message; 

tokenizing at least one domain name associated with the first email message; 

tokenizing at least one attachment of the first email message, wherein tokenizing 
the at least one attachment includes generating a 128-bit MD5 hash of the attachment, 
appending a 32-bit length of the attachment to the generated MD5 hash resulting in a 160-bit 
number, and UUencoding the resulting 160-bit number; 

determining a spam probability from the generated [[token;]] tokens;

in response to [[determining]] a determination that the spam probability from the
generated [[token,]] tokens indicates that the first email message is likely spam:

determining whether the generated tokens are present in a database of tokens;

in response to a determination that at least one of the generated tokens is
not present in the database of tokens, assigning [[whether the token exists in]] a probability value
for each token as spam and adding the token and assigned probability value to the database of
tokens; and

in response to [[determining]] a determination that the token [[exists]] is
present in the database of tokens, updating a probability value of the token; [[and]]

in response to [[determining]] a determination that the spam probability from the
generated [[token,]] tokens, indicates that the first email message is not likely spam:

determining whether the generated tokens are present in a database of tokens;

in response to a determination that at least one of the generated tokens is 
not present in the database of tokens, assigning a probability value for each token indicative of 
non-spam and adding the token and assigned probability value to the database of tokens; and 

[[in response to determining that the token does not exist in the database of
tokens, assigning a probability value indicative of spam to the token]]

in response to a determination that the token is present in the database of
tokens, updating a probability value of the token;

sorting the generated tokens in accordance with the corresponding determined 
spam probability value; and 

filtering a second email message according to the training. 
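
By way of illustration only, the database-update portion of the training recited in claim 1 might be sketched as follows in Python. The function names, the initial probability values (0.99 and 0.01), and the update rule are assumptions made for this sketch; the claims recite only that a probability value is assigned, added to the database, or updated.

```python
from typing import Dict, Iterable, List

# Hypothetical constants: the claims do not specify numeric values.
INITIAL_SPAM = 0.99      # assigned to a new token seen in likely-spam mail
INITIAL_NON_SPAM = 0.01  # assigned to a new token seen in likely-non-spam mail
LEARNING_RATE = 0.1      # assumed step size for updating an existing token


def train_on_message(db: Dict[str, float],
                     tokens: Iterable[str],
                     likely_spam: bool) -> None:
    """Add unseen tokens to the database or update tokens already present."""
    target = 1.0 if likely_spam else 0.0
    for tok in set(tokens):
        if tok not in db:
            # Token not present: assign a value and add it to the database.
            db[tok] = INITIAL_SPAM if likely_spam else INITIAL_NON_SPAM
        else:
            # Token present: move its value toward the message's class.
            db[tok] += LEARNING_RATE * (target - db[tok])


def sort_tokens(db: Dict[str, float], tokens: Iterable[str]) -> List[str]:
    """Sort tokens by their stored spam probability, highest first."""
    return sorted(set(tokens), key=lambda t: db[t], reverse=True)
```

Here `likely_spam` stands for the outcome of the spam-probability determination recited earlier in the claim; how that determination is made is outside this sketch.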

2. (Previously Presented) The method of claim 1, wherein generating the phonetic
equivalent of the word comprises: 

identifying a string of characters, the string of characters including a non-alphabetic 
character; and

removing the non-alphabetic character from the string of characters.



3. (Previously Presented) The method of claim 2, wherein removing the non- 
alphabetic character comprises: 

locating a non-alphabetic character within the string of characters, the non-alphabetic 
character being at least one selected from the group consisting of: 

" (quote); 

' (single quote); 

! (exclamation mark); 

@ (at); 

# (pound); 
$ (dollar); 

% (percent); 

^ (caret);

& (ampersand); 

* (asterisk); 

( (open parenthesis); 
) (close parenthesis); 
_ (underscore); 
- (hyphen); 
+ (plus); 
= (equal); 
\ (backslash); 
/ (slash); 

? (question mark); 
(space); 
(tab);

[ (open square bracket); 

] (close square bracket); 

{ (open bracket); 

} (close bracket); 

< (less than); 

> (greater than); 

, (comma); 

: (colon); 

; (semi-colon); 

and . (period). 
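
For illustration, the character-removal step of claims 2 and 3 might be sketched as below. The set literal mirrors the characters listed in claim 3; the names `NON_ALPHABETIC` and `strip_non_alphabetic` are hypothetical, and the sketch covers only the removal step, not any further processing used to form a phonetic equivalent.

```python
# The characters enumerated in claim 3, including space and tab.
NON_ALPHABETIC = set("\"'!@#$%^&*()_-+=\\/? \t[]{}<>,:;.")


def strip_non_alphabetic(text: str) -> str:
    """Remove every listed non-alphabetic character from the string."""
    return "".join(ch for ch in text if ch not in NON_ALPHABETIC)
```

Under this sketch, an obfuscated string such as `v.i.a.g.r.a` reduces to `viagra`. Digits are not in the claimed list, so a string such as `v1agra` would pass through unchanged.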

4. (Previously Presented) The method of claim 1, wherein determining the spam
probability comprises: 

assigning a spam probability value to the token; and 

generating a Bayesian probability value using the spam probability value assigned to the
token.
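
Claim 4 does not state how the Bayesian probability value is computed. One common choice, assumed here purely for illustration, is the naive Bayes combination P = ∏p / (∏p + ∏(1 − p)) over the per-token spam probabilities p. The function name `combine` and the clamping bound are hypothetical.

```python
import math
from typing import Iterable


def combine(token_probs: Iterable[float]) -> float:
    """Combine per-token spam probabilities into one message-level value."""
    probs = list(token_probs)
    if not probs:
        return 0.5  # no evidence either way (assumed default)
    eps = 1e-6  # keep values away from 0 and 1 so the logarithms are finite
    probs = [min(max(p, eps), 1.0 - eps) for p in probs]
    # Work in log space to avoid underflow when many tokens are combined.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    x = log_spam - log_ham
    # Numerically stable logistic function of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

For two tokens each assigned 0.9, this gives 0.81 / (0.81 + 0.01), or roughly 0.988.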

5. (Previously Presented) The method of claim 4, wherein determining the spam 
probability further comprises: 

comparing the generated Bayesian probability value with a predefined threshold value. 

6. (Previously Presented) The method of claim 5, wherein determining the spam 
probability further comprises: 

categorizing the email message as spam in response to the Bayesian probability value 
being greater than the predefined threshold. 



7. (Previously Presented) The method of claim 5, wherein determining the spam 
probability further comprises: 

categorizing the email message as non-spam in response to the Bayesian probability 
value being not greater than the predefined threshold. 
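
The comparison and categorization of claims 5 through 7 can be sketched in a few lines. The threshold value 0.9 is an assumption; the claims recite only "a predefined threshold value".

```python
SPAM_THRESHOLD = 0.9  # hypothetical value; not specified in the claims


def categorize(bayesian_probability: float,
               threshold: float = SPAM_THRESHOLD) -> str:
    """Return 'spam' if the value exceeds the threshold, else 'non-spam'."""
    return "spam" if bayesian_probability > threshold else "non-spam"
```

Because claim 7 recites "not greater than", a value exactly equal to the threshold is categorized as non-spam in this sketch.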

8. (Currently Amended) A system comprising: 

means for receiving an email message having a [[word;]] word and an attachment;
means for generating a phonetic equivalent of the word from the email message;
means for tokenizing the phonetic equivalent of the word to generate a token
representative of the phonetic equivalent; [[and]]
means for tokenizing the attachment;

means for determining a spam probability from the generated [[token.]] token; and
means for sorting the generated tokens in accordance with the corresponding 
determined spam probability value. 

9. (Currently Amended) A system comprising: 
a processor; and 

a memory, the memory storing: 

receive logic configured to receive an email message having a [[word;]] word and
an attachment;

phonetic logic configured to generate a phonetic equivalent of the word from the 
email message; 

first tokenize logic configured to tokenize the phonetic equivalent of the word to 
generate a token representative of the phonetic equivalent; [[and]]

second tokenize logic configured to tokenize the attachment;

spam-determination logic configured to determine a spam probability from the
generated [[token.]] tokens; and

sorting logic configured to sort the generated tokens in accordance with the 
corresponding determined spam probability value. 
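
As an illustration of attachment tokenization, the sketch below follows the steps recited in claim 1: a 128-bit MD5 hash, a 32-bit length appended to give a 160-bit number, and UUencoding of the result. The function name `attachment_token`, the big-endian byte order of the length field, and the truncation of the length to 32 bits are assumptions not found in the claims.

```python
import binascii
import hashlib
import struct


def attachment_token(data: bytes) -> str:
    """Build a UUencoded token from an attachment's MD5 hash and length."""
    digest = hashlib.md5(data).digest()                  # 16 bytes = 128 bits
    length = struct.pack(">I", len(data) & 0xFFFFFFFF)   # 4 bytes = 32 bits
    combined = digest + length                           # 20 bytes = 160 bits
    # b2a_uu encodes one line (up to 45 bytes) and appends a newline.
    return binascii.b2a_uu(combined).decode("ascii").rstrip("\n")
```

Identical attachments yield identical tokens, so the same attachment can be recognized across messages without storing its contents.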

10. (Previously Presented) The system of claim 9, the memory further storing:

string-identification logic configured to identify a string of characters, the string of
characters including a non-alphabetic character; and

character-removal logic configured to remove the non-alphabetic character from the
string of characters.

11. (Previously Presented) The system of claim 10, the memory further storing:

spam-probability logic configured to assign a spam probability value to the token; and

Bayesian logic configured to generate a Bayesian probability value using the spam
probability value assigned to the token.

12. (Previously Presented) The system of claim 11, the memory further storing:

compare logic configured to compare the generated Bayesian probability value with a
predefined threshold value.

13. (Previously Presented) The system of claim 12, the memory further storing:

spam-categorization logic configured to categorize the email message as spam in
response to the Bayesian probability value being greater than the predefined threshold.

14. (Previously Presented) The system of claim 12, the memory further storing:

spam-categorization logic configured to categorize the email message as non-spam in
response to the Bayesian probability value being not greater than the predefined threshold.



15. (Currently Amended) A computer-readable medium [[comprising:]] that includes a
program that, when executed by a computer, causes the computer to perform at least the
following:

[[a processor; and]]

[[a memory, the memory storing:]]

[[computer readable code adapted to instruct a programmable device to]] receive an email
message having a [[word;]] word and an attachment;

[[computer readable code adapted to instruct a programmable device to]] generate a
phonetic equivalent of the word from the email message;

[[computer readable code adapted to instruct a programmable device to]] tokenize the
phonetic equivalent of the word to generate a token representative of the phonetic equivalent;

tokenize the attachment;

[[computer readable code adapted to instruct a programmable device to]] determine a
spam probability from the generated [[token,]] token; and

sort the generated tokens in accordance with the corresponding determined spam 
probability value. 

16. (Currently Amended) The computer-readable medium of claim 15, [[the memory
further storing:]] the program further causing the computer to perform at least the following:

[[computer readable code adapted to instruct a programmable device to]] identify a string
of characters, the string of characters including a non-alphabetic character; and

[[computer readable code adapted to instruct a programmable device to]] remove the
non-alphabetic character from the string of characters.

17. (Currently Amended) The computer-readable medium of claim 15, [[the memory
further storing:]] the program further causing the computer to perform at least the following:

[[computer readable code adapted to instruct a programmable device to]] assign a spam
probability value to the token; and

[[computer readable code adapted to instruct a programmable device to]] generate a
Bayesian probability value using the spam probability value assigned to the token.

18. (Currently Amended) The computer-readable medium of claim 17, [[the memory
further storing:]] the program further causing the computer to perform at least the following:

[[computer readable code adapted to instruct a programmable device to]] compare the
generated Bayesian probability value with a predefined threshold value.

19. (Currently Amended) The computer-readable medium of claim 18, [[the memory
further storing:]] the program further causing the computer to perform at least the following:

[[computer readable code adapted to instruct a programmable device to]] categorize the
email message as spam in response to the Bayesian probability value being greater than the
predefined threshold.

20. (Currently Amended) The computer-readable medium of claim 18, [[the memory
further storing:]] the program further causing the computer to perform at least the following:

[[computer readable code adapted to instruct a programmable device to]] categorize the email
message as non-spam in response to the Bayesian probability value being not greater than the
predefined threshold.